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Large-scale computer experiments are becoming increasingly im- 
OQ \ portant in science. A multi-step procedure is introduced to statisti- 

cians for modeling such experiments, which builds an accurate inter- 
polator in multiple steps. In practice, the procedure shows substantial 
improvements in overall accuracy, but its theoretical properties are 
not well established. We introduce the terms nominal and numeric 
error and decompose the overall error of an interpolator into nomi- 
,J^ , nal and numeric portions. Bounds on the numeric and nominal error 

^ ■ are developed to show theoretically that substantial gains in overall 

accuracy can be attained with the multi-step approach. 
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1. Introduction. Computer experiments use complex mathematical mo- 
dels implemented in large computer codes to study real systems. In many 
(^ ■ situations, a physical experiment is not feasible because it is unethical, im- 

C^ I possible, inconvenient or too expensive. A mathematical model of the system 

^ ■ can often be developed and input/output pairs can be produced with the 

help of computers. Typically, the input/output pairs are expensive in the 
sense that they require a great deal of time and computing to obtain and 
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pi . they are nearly deterministic in the sense that a particular input will pro- 

duce almost the same output if given to the computer experiment on another 
occasion. Computer experiments are widely used in systems biology, engi- 
, , , neering design, computational biochemistry, climatology and epidemiology 

r> ■ and their pervasiveness in science, engineering and medicine is only growing. 



Received May 2011; revised September 2011. 

^Supported by Singapore National Medical Research Council Grant IRG10nov006 and 
an NUS Initiative to Improve Health in Asia, Global Asia Institute grant. 

^Supported by NSF Grant CMMI 0969616, NSF Career Award DMS-10-55214 and an 
IBM Faculty Award. 

AMS 2000 subject classifications. Primary 41A17; secondary 65M12, 65G50. 

Key words and phrases. Computer experiment, emulation, interpolation, Gaussian pro- 
cess, large-scale problem, multi-step procedure, numerical technique, radial basis function, 
reproducing kernel Hilbert space. 



This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2011, Vol. 39, No. 6, 2974-3002. This reprint differs from the original in 
pagination and typographic detail. 

i 



2 B. HAALAND AND P. Z. G. QIAN 

When using a computer experiment to study a real system, a thorough explo- 
ration of the surface is typically wanted. However, obtaining input /output 
pairs is often too expensive for a complete exploration. A solution is to eval- 
uate the computer experiment at several well-distributed data sites given 
by a space-filling design [9, 21, 32, 34, 37, 38, 55]. Then build an interpola- 
tor which can be used as a stand-in, or emulator, for the actual computer 
experiment. The thorough exploration of the complex surface can then be 
carried out on the interpolator. Excellent overviews on data collection and 
modeling for computer experiments can be found in [6, 8, 23, 48-50]. 

To emulate the output from a computer experiment, Gaussian process (GP) 
models or reproducing kernel Hilbert space (RKHS) interpolators are often 
used. These interpolators have a simple form and control the smoothness of 
the emulator. In particular, let / denote the output of a run of the com- 
puter experiment, so that the functional link between input x and output y 
is y = f{x). Take $ : O x [7 — ^ R to be symmetric in its two arguments and 
positive definite. The kernel $ is positive definite on a domain of interest Q if 



EE' 



aiaj^{xi,Xj) > 
i=i j=i 

for every nonzero a € M" and distinct {xi, . . . , Xn} C Q. Then, given distinct 
input sites X = {xi, . . . ,x„}, the GP or RKHS interpolator has the simple 
form 

n 

where a has Axa = f\x, Ax = {^{xi,Xj)} and f\x = {f {xi) ■ ■ ■ f {xn))' ■ As- 
sociated with each symmetric, positive definite kernel is exactly one Hilbert 
space of functions whose norm, in the case that the kernel is smooth, mea- 
sures both size and smoothness. For a particular kernel <I>, this associated 
function space will be called its native space and will be denoted ^#(0). 
Native spaces will be discussed further in Section 5. The smoothness of the 
emulator is controlled in the sense that the RKHS interpolator has the small- 
est possible native space norm of any function interpolating f\x [10, 59, 60]. 
It is worth noting that the GP models often used in practice to build em- 
ulators for computer experiments are essentially a special case of RKHS 
emulators. In the GP context, the kernel $ is a, possibly scaled, correlation 
function. In the case that a nonzero mean function jl is estimated in the 
GP model, the interpolator is actually the sum of this estimated mean func- 
tion and an RKHS interpolator of the residual (/ — jl)\x- Here, we consider 
translation invariant, or stationary, kernels so that <^ is a function of only 
the difference between its arguments. Hereafter, <I>(x,y) will be written as 
^{x — y). Note that the connection between Gaussian processes and RKHS 
was also discussed in [59]. 
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Many of the systems which scientists, engineers and medical researchers 
use computer experiments to study exhibit extremely complex behavior in 
portions of the input space. To discover and understand these regions re- 
quires a large-scale computer experiment with many input sites which are 
potentially very near one another. Unfortunately, most methods for build- 
ing emulators, including RKHS and GP interpolators, suffer from increas- 
ingly poor predictive accuracy due to numerical problems as the number 
of observations of the computer experiment becomes larger. Throughout, 
we refer to large-scale computer experiments as those with a large num- 
ber of runs. Such experiments appear frequently in various fields such as 
aerospace engineering [4], information technology [20], biology, high-energy 
physics, nanotechnology and security. The essential difficulty in emulation of 
a large-scale computer experiment is that as input sites become nearer to one 
another the problem of finding an interpolator becomes ill-conditioned and 
so less amenable to accurate calculation. Several techniques exist for numer- 
ically stabilizing kernel-based interpolators, including adding a nugget effect 
[27, 50], using compactly supported kernels [10, 13], covariance tapering [22], 
decomposing the correlation matrix [5] and approximating likelihoods [53]. 
The multi-step procedure [12] described below also addresses the vital issue 
of numerical stability and can be used alone or in concert with additional 
numeric measures such as those mentioned above. 

The multi-step procedure is not new to the field of applied mathematics, 
yet the exposure of statisticians to this method is relatively limited. Further, 
while the procedure often improves overall predictive accuracy substantially 
in practice, minimal work has been done on its theoretical properties [10]. 
Notable exceptions include [33], [11] and [16]. The existing theoretical work 
in the literature examines numerical accuracy in a relatively qualitative 
manner. Here, we introduce the concepts of nominal and numeric accu- 
racy. Nominal accuracy refers to the accuracy which would be attained if 
computations could be performed without floating point rounding. Numeric 
accuracy refers to how close computed quantities are to their correspond- 
ing nominal counterparts. Then, we introduce a decomposition of the error 
of an interpolator into nominal and numeric portions. This gives a com- 
plete description of the computed interpolator's error while separating the 
contributing sources of error to allow for more straight-forward analysis. 
Bounds on the numeric and nominal error of the multi-step interpolator are 
developed. The numeric bound is the only complete, rigorous bound on the 
numeric error of the multi-step interpolator. The result is very general and 
makes very few assumptions about the kernels used in different steps. The 
nominal bound is similar to the error bound developed in [33], but more 
general in that it allows the kernels at different stages to be re-scaled in 
a flexible manner. In practice, the kernel re-scalings can have a large impact 
on accuracy. 
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2. Multi-step interpolator. The multi-step procedure explored here is 
a generalization of the procedure introduced in [12] . Their idea was to form 
well-spread nested subsets of the data. Then interpolate the first subset us- 
ing a wide kernel and form residuals of this interpolator on the next subset. 
The residuals are then interpolated using a narrower kernel and the cur- 
rent stage, and previous stage interpolators are added together, giving an 
interpolator on the larger subset. This procedure is repeated an appropriate 
number of times, at each stage updating the interpolator, until an inter- 
polator of the complete data is obtained. We introduce a separation of the 
error into nominal and numeric portions and derive bounds on each type of 
error. We adopt a slightly different notation than [12] . Let / denote the un- 
known function to be interpolated and J7 C M"^ denote the domain of interest. 
Throughout, the following assumption is made about the kernel <1>. 

Assumption 1. The kernel <l> is continuous, positive definite and trans- 
lation invariant. 

Note that with minor modifications, the development and results in Sec- 
tions 1-4.3 only require that $ is positive definite. 

In the below description of the multi-step interpolation procedure, J de- 
notes the number of stages, and $j denotes the kernel used for interpolation 
in stage j. Now, take 

(1) Xi(Z---(ZXj = X 

and initialize V^ = 0. Then, for j = 1, . . . , J, let 



V\x) = )^ai^j{x-x^), 






u=l 




( ^'^ \ 






«^=^xU f-Y.v- 


? 




\ k=0 / 


^j 




^Xj,1>j={^j{Xu-X^)}, u,v=l,. 


.,nj, 


rij = card Xj . 







(2) 



Then the multi-step interpolator, 

J 
(3) p(x) = ^P^(x) 

i=i 
satisfies the interpolation conditions V{xu) = f{xu), u= 1, . . . , n, where n = 
cardX. Here, X is the complete set of input sites. The results in this article 
indicate that the best performance will be achieved if each of the nested 
designs, Xi, . . . ,Xj, are chosen to have well-separated data sites, uniform 
low-dimensional projections and small data-free regions. Note that a^ should 
not be calculated using the formula A^ $.(/ — Sfc=o'^^)l^j' ^^^* instead 
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as the solution to the hnear system Ax^^^^a^ = if ~ 'l2k=o^''^\^3- ^^ gen- 
eral, the solution to the linear system is subject to smaller numeric error. 
Also, in the situation where n is large and ^x ,<J> is sparse due to memory 
constraints, AZ , will often be too dense to be stored. 

It is commonly the situation that each kernel $j depends on parame- 
ters Qj. For example, in Section 5 it is assumed that ^j is a known ker- 
nel ^j whose inputs x — y are re-scaled by a matrix Qj, so that ^j{x — y) = 
^j{Qj{x — y)). The form of the underlying kernels ^j is often fixed in ad- 
vance to achieve an interpolator with prespecified smoothness and numer- 
ical properties. In particular, the results in Sections 4 and 5 indicate that 
smoother underlying kernels have better nominal properties and worse nu- 
meric properties, as defined in (8), and vice versa. The accuracy of the in- 
terpolator can depend significantly on the choice of parameter values. A few 
possible criteria for choosing the parameters Qj are cross-validation, maxi- 
mum likelihood and sparsity of the interpolation matrices. Most procedures 
for choosing the Qj are simplified by considering each stage sequentially. In 
particular, Qj can be chosen to minimize the cross-validation error, maxi- 
mize the likelihood or restrain the number of nonzero entries in the interpo- 
lation matrix Ax ,$ at stage j. For smaller problems, where a dense A'^ ^ 
can be stored, the short-cut formula in (34) can be used to make leave- 
one-out cross-validation computationally efficient. For larger problems, an 
option such as 10-fold cross-validation is more appropriate. If the residuals 
from the previous stage (/ — X] 1=0^*^ )!-''' ^'^^ modeled as a GP, then max- 
imum likelihood can be used to choose the parameters Qj. Maximizing the 
likelihood at each stage is equivalent to minimizing 

~i 
(4) rij log 




a- 



X, 



+ logdet(Ax,,$, 



Restricted maximum likelihood estimates can be obtained by replacing the rij 
in the objective function (4) by Uj — rij-i, with uq = 0. For large problems, 
a storage and computation efficient algorithm such as [2] should be used 
in calculating logdet(Ax ,$ )• For very large problems, memory constraints 
demand that the sparsity of ^x ,$ be considered. One possibility for com- 
pactly supported kernels is to choose fixed Qj to ensure that the number of 
nonzero entries in j4x ,<j> is manageable as in (35). Another possibility is to 
incorporate a penalty for nonsparsity into the objective function such as (4). 
If the error at stage j, f — J2k=o'^''^ ^^ modeled as a GP, then confidence 
intervals on the function's values f{x) can be obtained in much the same 
manner as a single stage interpolator [58]. In particular, model the output as 

J 
/(x) = ^Z,(x), 
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where the Zj are mean zero Gaussian processes with Cov(Zj(xi), Zj(x2)) = 
iT?<&j(xi — X2)- Note that the Zj are not independent. For point sets X 
and y, denote the cardX x cardy matrix of pairwise kernel evaluations of 
points in X and Y as 

(5) ^{X-Y) = {^{xu-y.)], 

where x„ G X, y„ G Y . Take Zq = to simplify the development below. 
Conditional on f\xj,Zi, . . . , Zj^i, 

J-i / / J-i ^ 

fix) - Y, Z,{x) ~ M ^j{Xj - x)'A~^\„^ [f-Y.^1 

3=0 \ \ i=0 / X,, 



a]{^m - ^j{Xj - x)'^^i $(Xj - x)) 



(6) 



J"i 



f{x)^MUj{Xj-x)'A-^\„[f-Y,Z, 



J-i 



+ Y,H^)^ 



Xj 3=0 



aj($j(0) - ^j{Xj - x)'A-^\„^^{Xj -x))\. 



Let Xj = {Xj, x}. Then, conditional on /|xj, ^i, • • • , ^j-i 

^ilx, ~ AAU,(X, - Xj)'A-^l^^ if - 'y^Z, 
(7) 



fe=0 



X, 



<t2($,(Xj - Xj) - $,(X, - Xj)'^^l ^,$,(X, - Xj)) 



Note that the distribution in (7) is singular and <^j {Xj — Xj) = A-^ ^ in the 

notation of (2). The first rij components of these conditional distributions 
are trivial and given by 



k=0 



J = l,. 



.J. 



X, 



The remaining nj — rij + 1 components have the nontrivial distribution, 
conditional on f\x,,Zi,. . . ,Zj-i, given by 

Zj\xj\x, ~ 



Afi <^j{X,-Xj\XjyA~]„if-Y,Zk 



k=0 



Xi 



a]{^,{Xj\X,-Xj\X,)-<^,iX,-Xj\X,)'A^]„^^j{X,-Xj\Xj))]. 
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After estimates of the a'j and any parameters in the ^j have been plugged 
in, the results in (6) and (7) can be combined to obtain a Gaussian estimated 
predictive distribution for f(x) conditional on f\xj with mean given by (3). 
For generating confidence intervals, the variance of the estimated predictive 
distribution, conditional on f\xj, can be calculated in a backwards recursive 
manner using (6) and (7). Once again, note that A~^b should be taken as 
shorthand for the solution to the linear system Ax = b. 

3. Nominal and numeric error. Now, we develop some intuition for why 
the multi-step procedure can improve accuracy in many situations in prac- 
tice. First, computed quantities, which are subject to floating point error, 
are distinguished from the idealized quantities that could be obtained if 
a computer performed calculations with full accuracy. Hereafter, computed 
quantities will be distinguished with a tilde, such as y. We introduce the 
following separation of error into nominal and numeric portions: 

|/(x) - r{x)\ = \f{x) - V{x) + V{x) - V{x)\ 

(8) 

<\f{x)-V{x)\ + \V{x)-V{x)\. 

Note that the absolute values in inequality (8) can be replaced with the 
norm of one's choosing. It is necessary to account for both nominal and nu- 
meric error since the trade-off between the two is very important. In most 
situations, reducing one will increase the other. The following proposition 
shows that the native space norm of the nominal error is always reduced by 
the addition of new data sites. Throughout, let M^{Q) denote the reproduc- 
ing kernel Hilbert space corresponding to the positive definite kernel $, and 
let II • llAr^(n) denote the norm on that space [1]. 

Proposition 3.1. /// G7V<j,(r2) and Xi C X2, then 

ll/-^2||A/i(n) < II/-^i|Ia/'<i,(Q), 

where V\ and V2 denote the single-stage interpolators on the sets Xi and X2, 
respectively. 

Proof. It can be shown that the interpolator is orthogonal to its error 
with respect to the native space inner product. This implies that the result 
holds if and only if 

|2 ||T> l|2 ^ II ^||2 11^ ||2 



lUs(n) ~ 11^211^^4,(0) - ll/llA/'<s,(f7) ~ ll^illAr$(n) 
^^ 11^211^,(0) >ri|li/-,(n) 

where the last equivalent condition follows from the definition of the native 
space norm and the fact that a^ = A^ ^f\xj for ^Xj,$ = {^{xu — x^)}, 
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Xu,Xv G Xj, j = 1,2. Then, write the interpolation matrix Ax2,^ as 

^^-* = V ^21 A,, 

where A12 = $(Xi - X2 \ Xi), ^21 = ^(^2 \ ^1 - ^1), and ^22 = ^(^2 \ 
Xi — X2 \ Xi), using the notation in (5). Using partitioned matrix inverse 
and binomial inverse results [18], it can be shown that 

+ (/UaVXi - ^21^Xi,$/Ui)'^22-l(/U2\Xi " ^21^Xi,*/Ui )' 

where ^221 = ^22 — ^2i^x $^12- Since ^^2-1 ^^ ^ block on the diagonal 
of A'^ ^, it must be positive definite and the result follows. D 

On the other hand, the numeric error can become arbitrarily large by the 
addition of new data sites. Throughout, let Amax(^) and Amin(^) denote 
the maximum and minimum eigenvalues, respectively, of a positive definite 
matrix A. Note that \mm{Ax,^) — )■ as min^^^^a;^ ||xu — Xv\\2 — )■ 0. There- 
fore, Amax(-43^|$) -;> 00 as minj.^^2.J|a;„ - x„||2 -^ 0. An unboundedly large 

maximum eigenvalue of A'^ ^ can enormously amplify small errors in the 
function and kernel evaluations. Consider the numeric error of the interpo- 
lator at a new point x, 



n 



V{x) - V{x) = ^[ai^{x - Xi) - ai$(x - Xi)] 

i=l 

n 

= X][("« ~ oii)^{x - Xi) - ai{^{x - Xi) - $(x - Xi))]. 

i=l 

Let e^ = a — a and e = $(X — x) — (^{X — x) using the notation in (5). 
Then 

V{x) - V{x) = X][er$(x - X,) - {ai - et)ef] 

i=l 
n 

= ^[4Hx-Xi)- Uief+e'^ef]. 

So, 

(9) \V{x)-P{x)\ > \f\'xA^]^e^\ - We'^himx - X)h - Ik^lblk^lb 

since Ax,^a = f\x- If, for example, e* is proportional to the eigenvector cor- 
responding to Xiam{Ax,^), and /|x is not orthogonal to e*, then the right- 
hand side of (9) can be made unboundedly large by taking Amin(^x,$) — ^ 0. 
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Fig. 1. Panels 1-3: interpolator in solid blue and actual function in dotted black with 
collected data indicated by black dots. 



This phenomenon can be illustrated by attempting to build an interpola- 
tor for the function 

f{x) = exp{(x + 1/2)2} sin(exp{(x ^ i/2)2}) 

shown in Figure 1 using the Gaussian kernel 

^[x -y)= exp{-(a; - yf}. 

Interpolators, shown in blue, are built on 11, 21 and 81 evenly spaced data 
points, shown in black dots, in the respective panels of Figure 1. As the 
density of points increases, so does the numeric error. 

Suppose that one is in the situation where most of the data sites are well 
spread, but a few poorly separated data sites are causing small numeric er- 
rors to be amplified. Consider forming an interpolator in two stages. In the 
first stage, remove the data sites which are causing the ill-conditioning of 
the interpolation matrix and interpolate the remaining points with a rela- 
tively wide kernel. The nominal error will be only slightly larger than the 
error for the full data set, since the removed data sites were nearly equal 
to data sites which were included. However, the numeric error will be sub- 
stantially less than that of an interpolator formed on the full data set. In 
the second stage, interpolate the residuals from the first-stage interpolator 
using a kernel which is narrow enough that numeric errors remain small. The 
second-stage interpolator will increase neither the nominal accuracy nor the 
numeric error substantially. When the two interpolators are added together 
to form the multi-step interpolator, the nominal accuracy may be slightly 
worse, but the numeric accuracy will be very much better. 

For example, consider building an emulator for the Michalewicz function 

/(a;,y) = sin(7rx)sin (vrx ) -|- sin(7ry) sin (27ry ) 



10 



B. HAALAND AND P. Z. G. QIAN 




1 

OS 

o.e 

04 
02 





NWHKWHWP 



[ j< jf 4S j; 4^4^ j^ V^ jT jc j« J[ iS j« J< JC, 



i:« 4£f<4£ « j^li: j( « ^ ^ #: ^ ^4f ^# 
G.4 :^^c^^ifJC^t^^c4tl^t*^■*^■lf 



OS 



SSI 



Fig. 2. Panel 1: the Michalewicz function. Panels 2-4 in clockwise order: 925 point data 



sets with separation distances 0.017, 0.009 and 5 x 10 



respective 



using the third 925 point data set in Figure 2 with separation distance 
5 X 10~^^. The separation distance of a point set X is half the distance 
between the closest two points, 



(10) 



1 

Z : 



mm 



\Xi 



^j\\2- 



Clearly, the x 's do not contribute much information about the unknown sur- 
face. If an ordinary Gaussian kernel interpolator, corresponding to a single 
stage with 



(11) 



)(x -y)= exp< - ^ ej{xj - yjf 



i=i 



is built using all the data sites, the best possible mean squared prediction 
error over values of 61,62 is ~0.15, the square of the function's L2 norm. 
This is because the kernel must be very narrow, or the interpolation matrix 
will be nearly singular. Throughout, the term mean squared prediction error 
is taken to be the average prediction error over the input domain. If, on the 
other hand, the -'s are interpolated and then the residuals on the x's are 
interpolated, corresponding to two stages, the best possible mean squared 
prediction error over values of ^1,^2 at each stage is ~1.5 x 10~^. 

4. Numeric accuracy. The numeric accuracy of the multi-step interpola- 
tion procedure depends on the accuracy of floating point matrix manipula- 
tions. Floating point accuracy refers to the fact that computers do not per- 
form calculations with real numbers, but instead with rounded versions 
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thereof. For example, a typical computer has 15 digits of accuracy meaning 
that 



IfI|2 

where x denotes the actual value, and x denotes the value that the computer 
stores. 

4.1. Numeric accuracy of matrix inversion. The following lemma on the 
accuracy of floating point matrix inversion is a combination and generaliza- 
tion of results in [14] . 



Definition 2. The matrix 2-norm || • II2 is defined as ||^||2 = \/^max{A'A). 

Lemma 1. Suppose Ax = b and Ax = b with \\A — A\\2 < 5a||^||2, ||^ — 
b\\2 ^ "^bll^lb <ind n{A) = rj^A < 1/<5a for some SA,Sb > 0. Then A is non- 
singular, 

\\x\\2 ^ l + r{6b/6A) 



uc 2 1 — r 

(12) „ 

^=^<^-^±^^{A), 

\\x\\2 l — r 

where k{A) = ||A||2||A"^||2. 

Proof. Suppose A is singular. Then there is a ?/ 7^ with Ay = so 
(/ — A~^A)y = y. This implies ||/ — yl~^yl||2 > 1. On the other hand, the 
conditions ||^ - A\\2 < 6a\\A\\2 and k{A) < 1/6a imply \\I - A~'^A\\2 < 1 
giving a contradiction. 

Now, Ax = b implies A~^Ax = A~^{b - {b-b))=x + A~^{b - b). The 
condition ||/ — 74~"'^y4||2 < r implies ||A~^74||2 > 1 — r and in turn 

||X||2<Y^(||X||2 + P"l||2||6-6||2) 

<-^^{\\x\\2 + 6,\\A-%\\b\\2) 
< p 2 + ^■ 



l-rV ^a\A\ 



<T^(lkll2 + r(5fe/5A)||x||2), 
l—r 

where the first inequality follows from the stated condition, the triangle 
inequality, and the fact that Ui?y||2 < ||-B||2||?/||25 the second inequality fol- 
lows from the condition ||6 — 6||2<5fe||6||2, the third inequality follows from 
the condition ^(yl) = r J^a and the final inequality follows from ||6||2 < 
ll^lbll^^lb- Dividing by ||2;||2 gives the first inequality in (12). 
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Note that A{x -x) = b-b- {A- A)x. So, 

\\X-X\\2< p~^||2||6-6||2 + ||vl"^||2p-^||2||x||2 
<6b\\A~%\\b\\2 + dA\\A-%\\A\\2\\x\\2 

<6bK{A)Jj^+6A'iiA)\\x\\2 

\\^\\2 

< k;(^)||x||2 Ob + OA- 



1 — r 

where the first inequaUty fohows from the triangle inequahty and the fact 
that J|Sy||2 < ||-B||2||y||2) the second inequality follows from the conditions 
\\b — b\\2 < (56116112 and ||^ — A\\2 < (5yi||^||2, the third inequality follows from 
the definition of k.{A) and the final inequality follows from the fact that 
||6||2 < ll^lbll^^lb and the first inequality in (12). Dividing by ||x||2 and sim- 
plifying gives the second part of (12). D 

4.2. Numeric accuracy of single-stage interpolator. The above lemma 
can be used to bound the numeric error of an interpolator as follows. 

Theorem 4.1. Suppose that \\Ax,<^ - ^x,<j>||2 < <Ja||^x,<j>||2, \\f\x - 
f\x\\2<Sf\\f\x\\2, K(Ax,<^) = r/6A<l/6A and sup^^^gj^ |$(x - y) - $(x - 
y)\ < D6a for some 5A,Sf,D > 0, then 

iSA±6f) 
1-r 



\nx)-P{x)\ < \\f\x/V^ C,^J' giX,^), 



ft 

5(X,$) = — ^- -U{Ax,^)^Q)+D), 

where k(-) is defined in Lemma 1. 

Note that for large n and approximately uniform X, H/lx/v^lb 
where 



L2(!^)i 



L2{n) = \ /(x)2dx. 

Further, the assumption sup^, j^gQ|<&(x — y) — '^{x — y)| < D5a requires that 
the kernel is computed in a relatively accurate manner. 



Proof of Theorem 4.1. First, 

n 

Vix) - V{x) = ^[aj<I)(rE - Xi) - ai^{x - Xi 



i=l 

n 



^[(oj - ai)$(x - Xi) - dj(i(x - Xi) - $(x - Xi))]. 



i=l 
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So, 

\V{x)-V{x)\ 



< 



^(oj -aj)$(x-Xi) 



i=l 



+ 



^ ai{^{x - Xi) - $(x - Xi)) 



i=l 



Applying the Cauchy-Schwarz inequality to each term gives 

\V{x)-V{x)\ 



< \\a — a\ 



+ ll^lb^ 



1 



i=l 



X Xi ) 



J2i^i^ - Xi) - ^{x - x,)y 

\i=i 



The terms under the radicals can be bounded to obtain 

\V{x) - V{x)\ < V^\\a - ahHO) + V^WahDdA- 
Now, Lemma 1 can be applied to the coefficients, giving 



\V{x)-V{x)\ < V" 



+ \/n — a 2-D^A• 
l — r 

Noting that ||a||2 < ||^x ^Ibll/I^lb and collecting terms shows that 



\V{x)-V{x)\ < 



1 — r 



< 



X {{Sa + df)K{Ax,^MO) + D{5a + r6f)) 
V^\\Ax\\\2\\f\xh 



\-r 
Rearranging gives the result. D 



-(<5a + <5/)(k(Ax,*)$(0) + £'). 



4.3. Numeric accuracy of multi-step interpolator. The first numeric re- 
sult for the multi-step interpolator follows from Theorem 4.1. Here, 5 denotes 
the computer's floating point accuracy, typically 5 < 10"^^. 
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Theorem 4.2. Suppose that for j = 1,. . .,J, ||y4x ,<j> 



A 



X„*,||2 < 



Sj\\Axj,'S>j\\2, \\f\xj - f\x,h < ^ll/k,l|2, k(^x,,<j.,) < r/6j < l/S^ and 
sup^ y^Q\^j{x — y) — $j(x — y)\ < D5 for some 6j,5,D > with 6j\\{f — 
Ei=i'P')\xJ^h<6\\f\x,/V^\\2,then 
J J 



(13) 



<S\\f\x,j/V^h 



J M 

M=l i&Sj{M)k=l 

r), S.j{M) = {i G N^^+i : 1 < ii < • • • < iM < U/+i = J] 



where C = 2/(1 

p{X,Y) = \\f\x / y/nx\\2/\\f\Y / ^/nY\\2, o,nd g is defined in Theorem 4-1- 



The assumption 6j\\{f -Y,k=\'P'')\xj / y/njh <S\\f\xj/^/nj\\2 roughly re- 
quires that the nominal errors either shrink or are not much larger than the 
function values. In practice, combinations of functions and training data sets 
which do not meet this assumption are very rare. 

Proof of Theorem 4.2. The result can be shown using induction on 
the number of stages J. If J= 1, then the result follows immediately from 
Theorem 4.1. Take J > 2, and assume the result holds for J — 1 stages. Then 



(14) 



J-l \ 
j=l ) 


( 

- f 

Xj V 


J-l \ 
i=l / 


Xj 


2 


<ll/U,-/k,|l2 + 


/J-l J-l 


<m\x,\\2 


+ y/n.j 


7-1 J- 

3=1 j= 


1 
1 


ioc 



where the first inequality follows from the triangle inequality, and the second 
inequality follows from the assumptions and by bounding the Li error with 
the maximum error. The induction hypothesis can be applied to the final 
term in (14) giving the bound 

,15) *«^l-* 

/ J-l M \ 

X i+p(Xj_i,xj)j]c*^ Y. V^p^^^.^^^.^M^-^u^^^S]- 

In stage J, the error from the first J — 1 stages are interpolated on Xj . Af- 
ter multiplying and dividing the above bound (15) by ||(/ — ^^1^ '^^)\xj\\2-, 
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Theorem 4.1 can be used to bound the error due to stage J . Note that 
the term 6f m Theorem 4.1 is the above bound (15) divided by ||(/ — 



^J-l 



^ ■ j^ 'P'')|xj||2 and the term 5a in Theorem 4.1 is 5j. By assumption, 5j is 



^J-l. 



smaller than or equal to (15) divided by ||(/ — X^,=i 'P'')|xj||2- Simplification 
and coarsening of the bound gives 

\V\x)-V\x)\ 
(16) <5\\f\xJ^jh-^^9{Xj,^j) 

/ J-l M \ 

\ AI=1 ig5j_i(Af)fc=l / 



Now, 



J J 



< 



j-i 



j-i 



+ |P'^(x)-P'^(x)|. 



1-kJ 



i=i i=i 

So, the induction hypothesis can be applied again along with (16) giving 

<5\\f\xJV^jh 

J-l M 

M=l iG<Sj_i(Af)fc=l 

(17) 

+ C5(Xj,cl>j) 

+ Cp{Xj.i,Xj)g{Xj,<^j) 

J-l M 

M=l ieSj-i{M)k=l 

Note that the term in square brackets in (13) is the sum of the terms with 
iM < J and iu = J giving 

J M 

M=l i&Sj(M) k=l 

J-l M 

= E^'' E T{piXi,,x,^^Mx^,,^i.) 

M=l ieSj(M),iM<Jk=l 
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J M 



M=l i£Sj{M),iM=Jk=l 



J~l M 

n(Y. Y. \r,(Y. (\, 



M=l J6Sj_i(M)fc=l 

J M 

A/=2 i£Sj(M),iM=Jk=l 

which is exactly the term in square brackets in (17), proving the result. D 

4.4. Dependence on separation distance. The terms 
(18) 9{Xj,<Pj) = —-j -(k(Ax,,#,)<&(0) +Z)) 

from Theorem 4.2 can be computed, at least approximately. However, by 
bounding (18) in terms of the separation distance, as defined in (10), the 
role of the data sites and the kernel's smoothness in the numeric accuracy are 
revealed. These results indicate that using poorly separated data or a wide 
kernel $ with a rapidly decaying Fourier transform, implying more smooth- 
ness, has more potential to result in large numeric errors in interpolation. 
The Fourier transform can be defined as follows. 



Definition 3. For / G Li(M'^) define the Fourier transform [51] 
/(w) = (27r)-'^/2 /■ /(2;)e-^'^'^dx. 



To generate the bound on (18), the following result from [60] can be used. 

Theorem 4.3. Let Lp^{M,(^) =mi\\^\\^<2M^{'^)- Then 
Amm(^X,<t.) > Cd^^{Md/q,^)/q'^, 

Md = l2{TTr^{d/2 + l)/9)^/('^+^\ 
Cd = {Md/2yy/{2T{d/2 + l)) 
for any q <qx, where Ax,^ = {^{xi — Xj)}. 

To bound Amax(^x,<i>) below, Gershgorin's theorem [57] can be used. Ger- 
shgorin's theorem states that the largest eigenvalue of Ax^^ has 

n 
|Amax(^X,<I.)-^(2;j-Xj)| < E I'^i^i - Xj)\. 
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Rearranging and coarsening the bound gives 

(19) A„iax(^x,#)<n$(0). 

Theorem 4.3 and inequahty (19) can be combined to obtain the following 
theorem bounding (18). 

Theorem 4.4. Under the assumptions in Theorem J^.2, 

g{Xj,<^j) < Kupper(Xj, $j)(«:upper(Xj, $j)$(0) + D), 



l^wppci \Xj ) *Pj ) 



^jlx, 



CdMMd/qx,,<^j)' 



The nested sequence Xi C • • • C Xj in (1) with large separation distance 
can be generated from nested space-filling designs [15, 41-44], which were 
originally developed for the purpose of conducting multi-fidelity computer 
experiments. Space- filling designs have shown particular merit in numerical 
integration [28-31, 35, 36, 39, 40, 52, 55]. Theorem 4.4 provides new insights 
into the use of such designs in interpolation. 

5. Nominal accuracy. The results in this section indicate that the nomi- 
nal error in interpolation converges to zero more quickly for wider, smoother 
kernels <&, although the constant involved in this rate changes. This is in di- 
rect opposition to the numeric error, which tends to be smaller for narrower, 
less smooth kernels. In fact, it will be seen that convergence of the nominal 
error of an arbitrarily fast rate can be achieved with an infinitely smooth 
kernel, such as the Gaussian in (11). 

A re-scaling is introduced in the following definition. 

Definition 4. For a nonsingular 9, define $0(x) = $(6x). 

5.1. Point-wise bound. Initially, consider a single stage with a fixed <^ 
which is re-scaled by a fixed G. For a set of input sites X of size n, define 
the cardinal basis functions 



Ui{x) = ^/3i$e(x - Xj), 



i=l 



Ui{Xj) = l{i=j} 

for i,j = l,...,n. Then 



^(^) ='^fixi)Mx)- 



i=l 
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Since fix) = {f,^e{--x))j^^^^n-) if /GAA$e(0), 

n 



i=l 



= / /, $0(- - X) - ^Ui(x)$0(- - Xi) \ 

\ i=l / AA.I.0 (n) 

Now, the Cauchy-Schwarz inequality can be applied, giving the error bound 



(20) \f{x)-V{x)\<\\f\\^,n) 



<^ei--x)-^Ui{x)<l>ei- 



i=l 



A/$„ (Q) 



The second term on the right-hand side of (20) is the so-called power func- 
tion, -P$e,x- It can be shown [60] that if the domain of interest Q is bounded 
and convex, then 

-P|e,x < C'lll^e - pIIl^{b(o,C2/ix))' 

where Ci, C2 > are constants which may depend on 0, p is any multivariate 
polynomial, B{a,b) = {x G M"^ : ||x — a||2 <b} and hx denotes the fill distance 

hx = sup min ||x — Xu\\2- 

Now, if $ has k continuous derivatives, p can be taken to be the Taylor's 
polynomial of ^q of degree k — 1. Then 

ll^e -p\\L^{BiQ,C2hx)) ^ C3\\Q\\2hx, 

where C3 is a constant which does not depend on 0. Combining the above 
development gives the following. 

Theorem 5.1. Suppose that il. is bounded and convex, $ satisfies As- 
sumption 1 and has k continuous derivatives and is nonsingular. Then 

\f{x)-V{x)\<C^\\eff\f\\f\W^^^n). 

5.2. Native space bound. First, write <l>e * '^e as 

^e*'^e{x-y)= / <^e{x -t)^e{y -t)dt. 
Jn 

Then, for / G J^^q*^q{^) and x G fi, express / in terms of the integral 
operator 



/(x)= / u{y)<^e*'^e{x-y)dy, 
Jn 
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where u G L2{i^). Combining these expressions gives 

fix) = [ u{y) I $e(y - t)^e{x - t) dtdy 
Jn Jn 

v{t)^e{x-t)dt, 



In 
where v S L2{0,) is given by 

v{t)= / u{y)^e{y-t)dy 
Jn 

for t G O. Then 

ll/-^llAA<|.Q(f7) = (/-^J)A/-<i,g,(n) 

(21) ={f-V,v)L,in) 

< \\f -'P\\L2{n)\\v\\L2in), 

where the first equahty follows from the orthogonality of the interpolator and 
its error with respect to the native space norm, the second equality follows 
from the properties of the integral operator and the inequality follows from 
the Cauchy-Schwarz inequality. 

If <I> has k continuous derivatives, then the first term on the right-hand 
side of inequality (21) can be bounded using Theorem 5.1 as 

(22) 11/ - Vh^in) < V^^omWf - V\\l^ m 

where the first inequality follows by relating the L2{^) and Loo{0,) norms, 
and the second inequality follows by applying Theorem 5.1 to f — V. Plug- 
ging inequality (22) into inequality (21) and canceling a single ||/ — 'P||a/'$ (n) 
term gives 

(23) ll/-^lk,3(n)<C$l|e||f4/'||HlL,(n). 

Using the properties of the integral operator, the square of the second term 
on the right-hand side of inequality (23) can be expressed as 

ll^llLmi = / u{x)u{y)<^eiy-t)<^eix-t)dxdydt 

(24) ^"' 



llA/«©.*e(f^)- 
Combining inequality (23) and equality (24) gives the following theorem. 

Theorem 5.2. Under the assumptions of Theorem 5.1, 

\\f-nN,^in)<C.,m\ThT\\f\\M,^,.^in). 
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To allow for individual re-scalings in different stages, we start with some 
notation. Define ^k recursively as 

(25) 

for A; G N. For the kernel on step j, take 
(26) ^i = ^e7- 

We now develop a bound on || • H^r^ „^ (q) in terms of || • ||_y^ (q) . The basic 
assumptions on the re-scaling matrices @j in this section are that they are 
nonsingular and larger than the @j-i in the sense that Amax(0l_i0j-i"'v^j) ^ 
1, where H'- = G^ . 

In the case O = M , the native space 7V$q(M ) has norm defined through 
the inner product 



(27) (/,.W,,(M.) = (2^)-^/7 ^^ItT^^"' 

where / and g denote the Fourier transform and complex conjugate of the 
Fourier transform, respectively, of /, 5 G A/^g (M'^) [60]. This explicit repre- 
sentation of the native space inner product can be used to relate the native 
space norms for convolutions and re-scalings. Hereafter, take 00 > C2 > ci > 
and T with 

(28) uj'uj<i^'iy =^ f{uj)>f{u), cif{uj)<^{io)<C2f{uj). 

Assumption 1 ensures that ci, C2 and ^ satisfying (28) exist [60]. The bounds 
to follow are tightest for C2 — ci as small as possible. Essentially, we want 
a radially decreasing envelop on the Fourier transform of the underlying ker- 
nel <I> to simplify development. Note that the Fourier transforms ^ and $e 
are related in the following manner: 

l>©(w) = (2^)-'^/2 f $0(2;)e-*^'^d2; 

= (2^)-'^/2 f $(e2;)e-*^'-'®^dx 

= (27r)-'^/Vet(H)| / ^y)e-'^'^'y dy 

JR'' 



(29) 



= |det(S)|^(H^), 

where H' = G~^ and the third equality follows by making the substitution 
y = Qx. 
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Proposition 5.1. If Assumption 1 is satisfied and Qj-i,Qj are non- 
singular with respective inverses H'_ j^,S', then 

Amax(e'_iej_iH'Hj) < 1 

(30) 

2 < / ^ \ |det(^j-i) |„^„2 



M^^.^^iR") ^ y^J |det(H,-)|2 "-^ "^*,-i(«'') 

for I < j < J where c\ and C2 satisfy (28), and ^j-i and (^j satisfy rela- 
tions (25) and (26). 

Proof. If / ^ A/'ci,^._^(M'^), then ||/||^_^ ^j^^^ = oo and (30) is true. Now, 
assume / G A/'$._;^(]R'^), and note that 



oj'rl-EjiLi 



^ Amax(0i--10j-l'='7'='j)- 



If Amax(e;._iej_iH;.H,) < l, then 



uj'rJj'r.jOj < oj rJj^i'r.j^iUJ 



(31) 



-l>(Hja;) > t(Hjw) > t(Hj_ia;) > -i(Hj_iL^) 

Cl C2 

< , C2\ 1 



^{-jujf'-' ~ \cij ^{Ej.iujy'-' 

< , C2\ 1 



where the first implication follows from (28), the second implication follows 
since the right- and left-hand sides are positive and the final implication 
follows from the relations (25) and (26) and the properties of Fourier trans- 
forms of convolutions. So, 



2 

A/'*,_i(K'*) 



3- 

|2 



A^^J-U-i)(K') 



'i-i 



(2.)-/^ / ^ffl d. 
(2vr)"'^/2 ^ |/(a;)|2 ^^ 



|det(Hj_i)| V ^>^-(J-i)(Hj_ia;) 



22 B. HAALAND AND P. Z. G. QIAN 

(2vr)"'^/2 f |/(^)|2 ^^ 



|det(Hj_i)| V ^iJJ-j * ^J-J{Ej_ioj) 



2 



|det(Hj_i)| V ^'^-i(Hj_ia;; 

|det(=^_l)| y,j |det(=j)P*-'-J(2j_iai)2 

>(2.)-^i"°f^'ivay""7 \m d. 



^Z'^K/i^) 



kietMP /c,V'^-^"^„,„2 



|det(Sj_i)| Vc2y "-"w*.,* 



J '"J 



where the first equahty follows from relation (26), the second equality follows 
from the inner product representation (27), the third equality follows from 
the scaled Fourier transform relation (29), the fourth equality follows from 
the definition of '^•^~^^~^> (25), the fifth equality follows from the prop- 
erties of Fourier transforms of convolutions, the sixth equality follows by 
multiplying by |det(Hj)p/|det(Hj)p, the inequality follows from the devel- 
opment (31), the seventh equality follows from the scaled Fourier transform 
relation (29) and the properties of Fourier transforms of convolutions and 
the final equality follows from the inner product representation (27). D 

In most applications, the domain of interest is a strict subset of M"^. If 
/ e AA$,(0), then / can be extended to Ef &Mi^^{W^) [60] with 



(32) 



for all <^2- Combining (32) with Proposition 5.1 gives the following corollary. 
Corollary 1. If the assumptions of Proposition 5.1 are satisfied, then 

Amax(0,_10j-l"iSjr) < 1 

(33) 

llk,**,{n) < yfj |det(H~)|2 "^"^*.-i(f2)- 
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Proof. If f ^N^^_^{9.), then 



A/-.I, _^{n) 



oo and (33) is true. Now, 



assume / e A/'$^„i (0) and extend f to Ef £ A/^^.j (M'^) with ||£^/||^ ,^d) 



k,_,{n)-Then 



C2\ |det(r,j„i 



< 



ci 



|det(.,)P ll^^ll 



2 



3-1 



C2 \ |det(5-,_i) 



ci 



|det(H,)P 



AA*_,(f^)' 



where the first inequality follows from (32), the second inequality follows 
from Proposition 5.1 and the equality follows from the property of the chosen 
extension. D 

5.3. Error bound for multi-step interpolator. Combining Theorem 5.2 
with Corollary 1, we are able to obtain the following theorem bounding the 
native space norm of the multi-step interpolator's error. 

Theorem 5.3. Under the assumptions of Theorem 5.1 and Proposi- 
tion 5.1, 






<C*, 



n 



f V|det(H,_i)| 



Af^jin) 



Jll/ll^*o(n)ii<^^^ 



mih%f-'-' 



Proof. First applying Theorem 5.2 and then applying Proposition 5.1 
gives 



J 






<c$||ej||'/'4/' 



M.i.,{n) 



j-i 
f-E'P' 



A/"* ,4.,(n) 



<'r ll(=> i|fc/2^fc/2 \/|det(Hj„i) 



f-E'P' 



Afi,j_^in) 



For J > 2, repeat the above argument J —1 more times, and note that ^j~j 
has k2^ continuous derivatives. D 



By applying Theorem 5.1 to the error / — X]7=i 'P'' ^ ^^ additional multiple 



k/2 

of h-^ is obtained in the following theorem. 
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Fig. 3. Left panel: Franke's function. Right panel: log mean squared prediction error 
versus number of stages (circles) and using mlegp (asterisk). 



Theorem 5.4. Under the assumptions of Theorem 5.1 and Proposi- 
tion 5.1, 






<C^,j\\f\\M„,in)\\e.J&'h'J'^ 



n 



2 -[-r f V|det(H~ 



\\ |det(- 



mih'xr'''' 



6. Examples. First, consider using the multi-step procedure to interpo- 
late Franke's function 

fix, y) = I exp{-((9x - 2f + (9y - 2f)/A] 

+ f exp{-((9x + 1)2/49 - (9y + l)VlO)} 



+ iexp{-((9x-7)2 + (9y 
-iexp{-((9x-4)2 + (9y 



3)2)/4} 
7f)} 



shown in the left panel of Figure 3. Theorems 4.2 and 4.4 indicate that each of 
the nested data sets should have well-separated points in the full dimension 
as well as lower-dimensional projections to give small numeric error, and 
Theorem 5.4 indicates that each of the nested data sets should have small 
data- free regions in the full dimension as well as lower-dimensional projec- 
tions to give small nominal error. Training data are collected from Franke's 
function using a randomized (0,4, 2)-net in base 5 [38] with 5^ = 625 points, 
which has a convenient nested structure with both the full and each sub- 
design having small data-free regions and relatively well-spread points in 
both the full and projected space, making it ideal for the multi-step proce- 
dure. Theorem 4.4 indicates that a less smooth underlying kernel ^ will give 
more numerically accurate results, while Theorem 5.4 indicates that a more 
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smooth kernel will give more nominally accurate results. To balance these 
opposing forces in this moderately sized example, the selected $ is Wend- 
land's compactly supported kernel with four continuous derivatives [60], 



$(3; -y) = (t){y/{x-yy{x-y)), 

<^(r) = (1 - r)^+2[(^2 ^ 4^ ^ 3)^2 ^ (3^ + 6)r + 3], 1= [d/2\ + 3, 

and the rescaling matrices Qi,. . . , Qj are restricted to be diagonal, so each 
input is re-scaled separately. The re-scalings for each stage are chosen by 
leave-one-out cross-validation, for which a simple short-cut formula holds 
making computation undemanding for this moderately sized problem, al- 
though A^, ^, needs to be calculated. In particular, the ith cross-validation 

error at stage j is [47] 

(34) «W = 4' B^=^x],^ 



Bi 



J '^j 



In this example, the single-stage sample size is ni = 625, the two-stage sam- 
ple sizes are ni = 250 and n2 = 625, the three-stage sample sizes are ni = 
250, n2 = 375 and n^ = 625 and the four-stage sample sizes are ni = 250, 
722 = 375, na = 500 and n^ = 625. The nested data sets are Xj = {xi £ X ■.i< 
rij}. The right panel of Figure 3 shows the logarithm of the mean squared 
prediction error on a test set of 1,000 randomly generated uniform points. 
Notice that the mean squared prediction error is improved from 4.4 x 10~^ 
to 5.4 X 10~^. A Gaussian process fit using the mlegp package [7] in R, on 
the other hand, has mean squared prediction error 6.8 x 10^^. 

Next, consider using the multi-step procedure to interpolate Schwefel's 
function for d = 5 



f{x) = -^(1,0003;^ - 500)sin(Y'|l,000xj - 500|)/1,000, 
i=i 

a two-dimensional projection of which with the remaining variables fixed 
at 1/2 is shown in the left panel of Figure 4. This function is relatively 
complex and a very large training set is needed to build an accurate emula- 
tor. To ensure easy nesting and good space-filling properties for sub-designs, 
data are collected from Schwefel's function using a randomized (0,8,5)-net 
in base 5 with 5^ = 390,625 points. In this example there is a great deal of 
potential for numeric problems so Wendland's continuous, compactly sup- 
ported kernel, 

$(x -y) = 4>{\/{x-y)'{x-y)), 



r 



) = (l-r)V+^ l=ld/2\+l 
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10" 





Fig. 4. Left panel: two-dimensional projection of Schwefel's function. Right panel: log 
mean squared prediction error versus number of stages. 



with relatively little smoothness is selected. The re-scaling matrices 0i, . . . 
Qj are chosen to be fixed scalar multiples of the identity, Qj = 6jld, with 



(35) 



Vio7r((i/2 + i) 



which ensures that each intepolation matrix Ax ■ ,$ has less than 10^ nonzero 
entries. Edge effects in the five-dimensional cube ensure that the number of 
nonzero entries is substantially less than 10^. In this example, the single- 
stage sample size is ni = 390,625, the two-stage sample sizes are ni = 5'^ = 
78,125 and n2 = 390,625 and the three-stage sample sizes are ni = 78,125, 
722 = 2 X 5^ = 156,250 and 723 = 390,625. The nested data sets are Xj = {xi G 
X:i < rij}. The right panel of Figure 4 shows the logarithm of the mean 
squared prediction error on a test set of 10,000 randomly generated uniform 
points. Notice that the mean squared prediction error is improved from 0.11 
to 0.036. On the other hand, the mlegp package runs out of memory trying 
to fit a GP. 

7. Discussion. We have presented the intuitively appealing and practi- 
cally useful multi-step interpolation procedure. This procedure is easy to use 
and offers substantial improvements in overall accuracy in the emulation of 
large-scale computer experiments. We introduced a decomposition of the 
error of any interpolator into nominal and numeric portions. This decompo- 
sition is important because it allows the two sources of error to be analyzed 
separately while emphasizing the interplay between the two types of errors. 
We proved a very general result bounding the numeric error of a multi-step 
interpolator, of which an ordinary interpolator is a special case. This result 
constitutes the only complete and rigorous bound on the numeric error of 
the multi-step interpolator. We proved that in the situation where the earlier 
stage kernels are convolutions of the later stage kernels, substantial nominal 
improvements can be realized. In the context of the multi-step interpolator, 
this result is the most general and explicit of its kind. 
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Further work on the multi-step interpolation method will be explored in 
the following directions. First, its implementation details, along with var- 
ious examples, will be reported in a subsequent article, to illustrate the 
theoretical results derived here. The implementation of the method requires 
the generation of nested data sites, for which the typical choice in applied 
mathematics is nested grids. Nested space- filling designs [15, 41-43], origi- 
nally constructed for running multiple computer experiments with different 
levels of accuracy, are a better choice because of their good uniformity prop- 
erties. Such designs can be generated by exploiting nesting in orthogonal ar- 
rays [19], U designs [55, 56], orthogonal Latin hypercubes [3, 25, 26, 54, 62] 
or scrambled nets [38]. Second, emulation of computer models with qual- 
itative and quantitative factors is currently getting increasing attention 
[17, 45, 46]. We plan to extend the multi-step procedure to accommodate 
these two types of factors. Third, beyond emulation of computer experi- 
ments, singularity issues arise in fitting many other large kernel models. We 
plan to introduce a general multi-step framework for fitting kernel based 
classification and regression methods with a large number of observations. 
As in the multi-step interpolation procedure, this framework obtains nested 
data sites and then fits a kernel model in multiple steps, where in each step 
interpolation is replaced by an appropriate procedure for the given problem. 
New theoretical bounds on the nominal and numeric accuracy, analogous 
to those in Sections 4 and 5, will be derived for this framework. The re- 
quired well-spread nested data sites for the framework will be generated by 
using nested space-filling designs or the efficient thinning algorithm [12] for 
observational data. In the revision of this paper, we became aware of new 
theoretical developments of the multi-step method in applied mathematics, 
including [24] and [61]. 
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